40 research outputs found
Dynamic Analysis of Executables to Detect and Characterize Malware
It is needed to ensure the integrity of systems that process sensitive
information and control many aspects of everyday life. We examine the use of
machine learning algorithms to detect malware using the system calls generated
by executables-alleviating attempts at obfuscation as the behavior is monitored
rather than the bytes of an executable. We examine several machine learning
techniques for detecting malware including random forests, deep learning
techniques, and liquid state machines. The experiments examine the effects of
concept drift on each algorithm to understand how well the algorithms
generalize to novel malware samples by testing them on data that was collected
after the training data. The results suggest that each of the examined machine
learning algorithms is a viable solution to detect malware-achieving between
90% and 95% class-averaged accuracy (CAA). In real-world scenarios, the
performance evaluation on an operational network may not match the performance
achieved in training. Namely, the CAA may be about the same, but the values for
precision and recall over the malware can change significantly. We structure
experiments to highlight these caveats and offer insights into expected
performance in operational environments. In addition, we use the induced models
to gain a better understanding about what differentiates the malware samples
from the goodware, which can further be used as a forensics tool to understand
what the malware (or goodware) was doing to provide directions for
investigation and remediation.Comment: 9 pages, 6 Tables, 4 Figure
Decomposing spiking neural networks with Graphical Neural Activity Threads
A satisfactory understanding of information processing in spiking neural
networks requires appropriate computational abstractions of neural activity.
Traditionally, the neural population state vector has been the most common
abstraction applied to spiking neural networks, but this requires artificially
partitioning time into bins that are not obviously relevant to the network
itself. We introduce a distinct set of techniques for analyzing spiking neural
networks that decomposes neural activity into multiple, disjoint, parallel
threads of activity. We construct these threads by estimating the degree of
causal relatedness between pairs of spikes, then use these estimates to
construct a directed acyclic graph that traces how the network activity evolves
through individual spikes. We find that this graph of spiking activity
naturally decomposes into disjoint connected components that overlap in space
and time, which we call Graphical Neural Activity Threads (GNATs). We provide
an efficient algorithm for finding analogous threads that reoccur in large
spiking datasets, revealing that seemingly distinct spike trains are composed
of similar underlying threads of activity, a hallmark of compositionality. The
picture of spiking neural networks provided by our GNAT analysis points to new
abstractions for spiking neural computation that are naturally adapted to the
spatiotemporally distributed dynamics of spiking neural networks
Tracking Cyber Adversaries with Adaptive Indicators of Compromise
A forensics investigation after a breach often uncovers network and host
indicators of compromise (IOCs) that can be deployed to sensors to allow early
detection of the adversary in the future. Over time, the adversary will change
tactics, techniques, and procedures (TTPs), which will also change the data
generated. If the IOCs are not kept up-to-date with the adversary's new TTPs,
the adversary will no longer be detected once all of the IOCs become invalid.
Tracking the Known (TTK) is the problem of keeping IOCs, in this case regular
expressions (regexes), up-to-date with a dynamic adversary. Our framework
solves the TTK problem in an automated, cyclic fashion to bracket a previously
discovered adversary. This tracking is accomplished through a data-driven
approach of self-adapting a given model based on its own detection
capabilities.
In our initial experiments, we found that the true positive rate (TPR) of the
adaptive solution degrades much less significantly over time than the naive
solution, suggesting that self-updating the model allows the continued
detection of positives (i.e., adversaries). The cost for this performance is in
the false positive rate (FPR), which increases over time for the adaptive
solution, but remains constant for the naive solution. However, the difference
in overall detection performance, as measured by the area under the curve
(AUC), between the two methods is negligible. This result suggests that
self-updating the model over time should be done in practice to continue to
detect known, evolving adversaries.Comment: This was presented at the 4th Annual Conf. on Computational Science &
Computational Intelligence (CSCI'17) held Dec 14-16, 2017 in Las Vegas,
Nevada, US
Neurogenesis Deep Learning
Neural machine learning methods, such as deep neural networks (DNN), have
achieved remarkable success in a number of complex data processing tasks. These
methods have arguably had their strongest impact on tasks such as image and
audio processing - data processing domains in which humans have long held clear
advantages over conventional algorithms. In contrast to biological neural
systems, which are capable of learning continuously, deep artificial networks
have a limited ability for incorporating new information in an already trained
network. As a result, methods for continuous learning are potentially highly
impactful in enabling the application of deep networks to dynamic data sets.
Here, inspired by the process of adult neurogenesis in the hippocampus, we
explore the potential for adding new neurons to deep layers of artificial
neural networks in order to facilitate their acquisition of novel information
while preserving previously trained data representations. Our results on the
MNIST handwritten digit dataset and the NIST SD 19 dataset, which includes
lower and upper case letters and digits, demonstrate that neurogenesis is well
suited for addressing the stability-plasticity dilemma that has long challenged
adaptive machine learning algorithms.Comment: 8 pages, 8 figures, Accepted to 2017 International Joint Conference
on Neural Networks (IJCNN 2017